

Real-time transcription


Dialogue in Resonance: An Interactive Music Piece for Piano and Real-Time Automatic Transcription System

Bang, Hayeon, Kwon, Taegyun, Nam, Juhan

arXiv.org Artificial Intelligence

This paper presents "Dialogue in Resonance", an interactive music piece for a human pianist and a computer-controlled piano that integrates real-time automatic music transcription into a score-driven framework. Unlike previous approaches that primarily focus on improvisation-based interactions, our work establishes a balanced framework that combines composed structure with dynamic interaction. With real-time automatic transcription as its core mechanism, the computer interprets and responds to the human performer's input in real time, creating a musical dialogue that balances compositional intent with live interaction while incorporating elements of unpredictability. In this paper, we present the development process from composition to premiere performance, including technical implementation, the rehearsal process, and performance considerations.
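The score-driven interaction the abstract describes can be imagined as a loop that watches the transcribed note stream for cue phrases written into the score and triggers the computer piano's response when one appears. The sketch below is purely hypothetical (the function and cue names are illustrative, not the authors' implementation), with notes represented as MIDI pitch numbers:

```python
def respond_to_performance(transcribed_notes, score_cues, trigger):
    """Hypothetical score-driven interaction loop: whenever the
    transcribed note stream contains a cue phrase from the score,
    fire the computer piano's response for that cue (once).

    transcribed_notes: list of MIDI pitches from the transcription system
    score_cues: dict mapping cue name -> list of MIDI pitches to match
    trigger: callback invoked with the cue name when it is detected
    """
    fired = []
    for cue_name, cue_notes in score_cues.items():
        n = len(cue_notes)
        for i in range(len(transcribed_notes) - n + 1):
            if transcribed_notes[i:i + n] == cue_notes and cue_name not in fired:
                trigger(cue_name)
                fired.append(cue_name)
    return fired
```

In a real system the matching would have to tolerate transcription errors and timing deviations; exact sequence matching is only the simplest stand-in.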


Evaluation of real-time transcriptions using end-to-end ASR models

Arriaga, Carlos, Pozo, Alejandro, Conde, Javier, Alonso, Alvaro

arXiv.org Artificial Intelligence

Automatic Speech Recognition (ASR), or speech-to-text (STT), has greatly evolved in the last few years. Traditional pipeline-based architectures have been replaced by joint end-to-end (E2E) architectures that simplify and streamline the model training process. In addition, new AI training methods, such as weakly-supervised learning, have reduced the need for high-quality audio datasets for model training. However, despite all these advancements, little to no research has been done on real-time transcription. In real-time scenarios, the audio is not pre-recorded, and the input audio must be fragmented to be processed by the ASR systems. To meet real-time requirements, these fragments must be as short as possible to reduce latency. However, audio cannot be split at arbitrary points: dividing an utterance into two separate fragments will generate an incorrect transcription, and shorter fragments provide less context for the ASR model. For this reason, it is necessary to design and test different splitting algorithms to optimize the quality and delay of the resulting transcription. In this paper, three audio splitting algorithms are evaluated with different ASR models to determine their impact on both the quality of the transcription and the end-to-end delay: fragmentation at fixed intervals, voice activity detection (VAD), and fragmentation with feedback. The results are compared to the performance of the same model without audio fragmentation to determine the effects of this division. The results show that VAD fragmentation provides the best quality at the highest delay, whereas fragmentation at fixed intervals provides the lowest quality at the lowest delay. The newly proposed feedback algorithm trades a 2-4% increase in WER for a 1.5-2 s reduction in delay relative to VAD splitting.
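The two baseline splitting strategies the abstract compares can be sketched in a few lines. This is a minimal illustration (the energy-threshold VAD here is a toy stand-in for a trained voice activity detector, and all parameter values are assumptions), showing why fixed intervals can cut through an utterance while VAD waits for a silence:

```python
import math

def fixed_interval_split(samples, sample_rate, interval_s):
    """Split audio into fixed-length fragments: lowest delay, but may
    cut an utterance in half, hurting transcription quality."""
    step = int(sample_rate * interval_s)
    return [samples[i:i + step] for i in range(0, len(samples), step)]

def vad_split(samples, sample_rate, frame_s=0.02,
              energy_threshold=0.01, min_silence_frames=5):
    """Split audio at silences found by a naive energy-based VAD:
    better fragment boundaries, but the split waits for a pause,
    which adds delay."""
    frame = int(sample_rate * frame_s)
    fragments, current = [], []
    silence_run, has_speech = 0, False
    for i in range(0, len(samples), frame):
        chunk = list(samples[i:i + frame])
        energy = math.sqrt(sum(x * x for x in chunk) / max(len(chunk), 1))
        current.extend(chunk)
        if energy < energy_threshold:
            silence_run += 1
            if silence_run >= min_silence_frames and has_speech:
                fragments.append(current)  # cut at a detected pause
                current, silence_run, has_speech = [], 0, False
        else:
            silence_run, has_speech = 0, True
    if current and has_speech:
        fragments.append(current)
    return fragments
```

On a signal of two one-second tones separated by 0.2 s of silence, the fixed-interval splitter produces fragments that straddle the "utterances", while the VAD splitter yields exactly one fragment per tone.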


How To Transcribe Streams of Audio Data in Real-Time with Python?

#artificialintelligence

In my previous blog posts, I went through the AssemblyAI speech-to-text API. I tried its core transcription service and played with some of its cool AI-powered features: the content moderation feature that spots sensitive topics and the topic detection feature that extracts the subjects that are spoken about in each audio segment. You can check it out here. (The code is also available on GitHub.) These experiments are all performed offline and take some time to run in order to generate the output.
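Real-time transcription replaces that offline upload with a loop that reads small chunks from the microphone and forwards each one to the API as soon as it is available (typically over a WebSocket). A library-agnostic sketch of just the chunking loop, where `send_chunk` stands in for the actual network send (an assumption, not the AssemblyAI client API):

```python
import io

def stream_audio(source, chunk_size, send_chunk):
    """Read fixed-size chunks from an audio source and hand each one
    to the transcription backend immediately, instead of waiting for
    the full recording."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # end of stream
            break
        send_chunk(chunk)

# Example with an in-memory stand-in for the microphone stream:
received = []
stream_audio(io.BytesIO(b"0123456789"), 4, received.append)
```

In a real setup, `source` would be a PyAudio input stream and `send_chunk` would write base64-encoded audio to the API's WebSocket.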


Online Sequence Alignment for Real-Time Audio Transcription by Non-Experts

Lasecki, Walter S. (University of Rochester) | Miller, Christopher D. (University of Rochester) | Borrello, Donato (University of Rochester) | Bigham, Jeffrey P. (University of Rochester)

AAAI Conferences

Real-time transcription provides deaf and hard of hearing people visual access to spoken content, such as classroom instruction and other live events. Currently, the only reliable sources of real-time transcriptions are expensive, highly trained experts who are able to keep up with speaking rates. Automatic speech recognition is cheaper but produces too many errors in realistic settings. We introduce a new approach in which partial captions from multiple non-experts are combined to produce a high-quality transcription in real time. We demonstrate the potential of this approach with data collected from 20 non-expert captionists.
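The core idea of combining partial captions can be illustrated with a toy pairwise merge: keep the words both captionists agree on, and fill each one's gaps with the other's words. This is only a sketch of the concept using Python's standard-library sequence matcher, not the paper's online multiple sequence alignment algorithm:

```python
from difflib import SequenceMatcher

def merge_captions(a, b):
    """Merge two partial word-level captions of the same speech.
    Words both typed are kept once; where they differ, the longer
    (more complete) guess fills the gap."""
    merged = []
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            merged.extend(a[i1:i2])  # both captionists agree here
        else:
            # One captionist missed these words; take the longer span.
            merged.extend(a[i1:i2] if (i2 - i1) >= (j2 - j1) else b[j1:j2])
    return merged
```

Merging `["the", "quick", "fox"]` with `["quick", "brown", "fox"]` recovers all four words, even though neither non-expert typed the complete phrase. The real system must additionally do this online, across more than two captionists, and in the presence of typos.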